Abstract

We study in this paper the problem of adaptive trajectory tracking control for a class of nonlinear systems
with parametric uncertainties. We propose to use a modular approach, where we first design a robust nonlinear
state feedback which renders the closed loop input-to-state stable (ISS), where the input is considered to be the
estimation error of the uncertain parameters, and the state is considered to be the closed-loop output tracking
error. Next, we augment this robust ISS controller with a model-free learning algorithm to estimate the model
uncertainties. We implement this method with two different learning approaches. The first one is a model-free
multi-parametric extremum seeking (MES) method and the second is a Bayesian optimization-based method called
Gaussian Process Upper Confidence Bound (GP-UCB). The combination of the ISS feedback and the learning
algorithms gives a learning-based modular indirect adaptive controller. We show the efficiency of this approach
on a two-link robot manipulator example.


I. Introduction 

Classical adaptive methods can be classified into two main approaches: ‘direct approaches’, where the controller
is updated to adapt to the process, and ‘indirect approaches’, where the model is updated to better reflect the
actual process. Many adaptive methods have been proposed over the years for linear and nonlinear systems. We
could not possibly cite here all the design and analysis results that have been reported; instead we refer the
reader to, e.g., [1], [2] and the references therein for more details. Of particular interest to us is the indirect
modular approach to adaptive nonlinear control, e.g., [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. In this 
approach, first the controller is designed by assuming that all the parameters are known and then an identifier 
is used to guarantee certain boundedness of the estimation error. The identifier is independent of the designed 
controller and thus the approach is called ‘modular’. For example, a modular approach has been proposed in [3] 
for adaptive neural control of pure-feedback nonlinear systems, where the input-to-state stability (ISS) modularity 
of the controller-estimator is achieved and the closed-loop stability is guaranteed by the small-gain theorem (see 
also [13], [14]). 

In this work, we build upon this type of modular adaptive design and provide a framework which combines
model-free learning methods and robust model-based nonlinear control to propose a learning-based modular indirect
adaptive controller, where model-free learning algorithms are used to estimate, in closed-loop, the uncertain
parameters of the model. The main difference with the existing model-based indirect adaptive control methods is
that we do not use the model to design the uncertain parameter estimation filters. Indeed, model-based
indirect adaptive controllers are based on parameter estimators designed using the system’s model, e.g., the
X-swapping methods presented in [2], where gradient descent filters obtained using the system’s dynamics are
designed to estimate the uncertain parameters. We argue that because we do not use the system’s dynamics to
design uncertainty estimation filters we have fewer restrictions on the type of uncertainties that we can estimate,
e.g., uncertainties appearing nonlinearly can be estimated with the proposed approach, see [5] for some earlier
results on a mechatronics application. We also show here that with the proposed approach we can estimate at the
same time a vector of linearly dependent uncertainties, a case which cannot be straightforwardly solved using
model-based filters, e.g., refer to [15] where it is shown that the X-swapping model-based method fails to estimate
a vector of linearly dependent model coefficients. In this work, we implement the proposed approach with two
different model-free learning algorithms: the first one is a dither-based MES algorithm, and the second one is
a Bayesian optimization-based method called GP-UCB. The latter solves the exploration-exploitation problem in
the continuous armed bandit problem, which is a non-associative reinforcement learning (RL) setting. Indeed,
MES is a model-free control approach with well-known convergence properties, since it has been analyzed in
many papers, e.g., [16], [17], [18], [19]. This makes MES a good candidate for the model-free estimation
part of our modular adaptive controller, as already shown in some of our preliminary results in [7], [8], [10].
However, one of the main limitations of dither-based MES is the convergence to local minima. To improve
this part of the controller, we introduce here another model-free learning algorithm in the estimation part of the
adaptive controller. Indeed, we propose in this paper to use a reinforcement learning algorithm based on Bayesian
optimization methods, known as GP-UCB, e.g., [20], which contrary to the MES algorithm is guaranteed to reach
the global minimum in a finite search space.

One point worth mentioning at this stage is that, compared to ‘pure’ model-free controllers, e.g., pure MES
or model-free RL algorithms, the control proposed here has a different goal. Indeed, the available model-free
controllers are meant for output or state regulation, i.e., solving a static optimization problem. On the contrary,
here we propose to use model-free learning to complement a model-based nonlinear control to estimate the
unknown parameters of the model, which means that the control goal, i.e., state or output trajectory tracking, is
handled by the model-based controller. The learning algorithm is used to improve the tracking performance of
the model-based controller, and once the learning algorithm has converged, one can carry on using the nonlinear
model-based feedback controller alone, i.e., without the need of the learning algorithm. Furthermore, because
we are merging a model-based control with a model-free learning algorithm, we believe that this
type of controller can converge faster to an optimal performance than a pure model-free controller,
since by ‘partly’ using a model-based controller, we are taking advantage of the partial information given by the
physics of the system, whereas the pure model-free algorithms assume no knowledge about the system, and thus
start the search for an optimal control signal from scratch.

Similar ideas of merging model-based control and MES have been proposed in [12], [21], [22], [4], [5], [6], [7],
[8], [10]. For instance in [12], [21] extremum seeking is used to complement a model-based controller, under
a linearity assumption on the model in [12] (in the direct adaptive control setting, where the controller gains
are estimated), or in the indirect adaptive control setting, under the assumption of linear parametrization of the
control in terms of the uncertainties in [21]. The modular design idea of using a model-based controller with an ISS
guarantee, complemented with an MES-based module, can be found in [5], [6], [7], [8], [10], where the MES was
used to estimate the model parameters, and in [4], [23], where feedback gains were tuned using MES algorithms.
The work of this paper falls in this class of ISS-based modular indirect adaptive controllers. The difference with
other MES-based adaptive controllers is that, due to the ISS modular design, we can use any model-free learning
algorithm to estimate the model uncertainties, not necessarily extremum seeking-based. To emphasize this we show
here the performance of the controller when using a type of RL-based learning algorithm as well.

The rest of the paper is organized as follows. In Section II we present some notations and fundamental definitions
that will be needed in the sequel. In Section III we formulate the problem. The nominal controller design is
presented in Section IV. In Section IV-B a robust controller is designed which guarantees ISS from the estimation
error input to the tracking error state. In Section IV-C the ISS controller is complemented with an MES algorithm
to estimate the model parametric uncertainties. In Section IV-D we introduce the RL GP-UCB algorithm as a
model-free learning algorithm to complement the ISS controller. Section V is dedicated to an application example and the
paper conclusion is given in Section VI.

II. Preliminaries 

Throughout the paper, we use $\|\cdot\|$ to denote the Euclidean norm; i.e., for a vector $x \in \mathbb{R}^n$ we have
$\|x\| = \|x\|_2 = \sqrt{x^T x}$, where $x^T$ denotes the transpose of the vector $x$. We denote by $\mathrm{Card}(S)$ the size of a
finite set $S$. The Frobenius norm of a matrix $A \in \mathbb{R}^{m \times n}$ with elements $a_{ij}$ is defined as
$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$.

Given $x \in \mathbb{R}^m$, the signum function is defined as $\mathrm{sign}(x) = [\mathrm{sign}(x_1), \mathrm{sign}(x_2), \ldots, \mathrm{sign}(x_m)]^T$, where
$\mathrm{sign}(\cdot)$ denotes the classical signum function. We use $\dot{f}$ to denote the time derivative of $f$ and $f^{(r)}(t)$ for the
$r$-th derivative of $f(t)$, i.e., $f^{(r)} = \frac{d^r f}{dt^r}$. We denote by $C^k$ functions that are $k$ times differentiable and by
$C^\infty$ a smooth function. A continuous function $\alpha : [0, a) \to [0, \infty)$ is said to belong to class $\mathcal{K}$ if it is strictly increasing
and $\alpha(0) = 0$. It is said to belong to class $\mathcal{K}_\infty$ if $a = \infty$ and $\alpha(r) \to \infty$ as $r \to \infty$ [24]. A continuous function
$\beta : [0, a) \times [0, \infty) \to [0, \infty)$ is said to belong to class $\mathcal{KL}$ if, for each fixed $s$, the mapping $\beta(r, s)$ belongs to class
$\mathcal{K}$ with respect to $r$ and, for each fixed $r$, the mapping $\beta(r, s)$ is decreasing with respect to $s$ and $\beta(r, s) \to 0$
as $s \to \infty$ [24].

Next, we introduce some definitions that will be used in the sequel, e.g., [24]. Consider the system

$$\dot{x} = f(t, x, u), \tag{1}$$

where $f : [0, \infty) \times \mathbb{R}^n \times \mathbb{R}^{n_a} \to \mathbb{R}^n$ is piecewise continuous in $t$ and locally Lipschitz in $x$ and $u$, uniformly in
$t$. The input $u(t)$ is a piecewise continuous, bounded function of $t$ for all $t \ge 0$.

Definition 1 ([24], [25]): The system (1) is said to be input-to-state stable (ISS) if there exist a class $\mathcal{KL}$
function $\beta$ and a class $\mathcal{K}$ function $\gamma$ such that for any initial state $x(t_0)$ and any bounded input $u(t)$, the solution
$x(t)$ exists for all $t \ge t_0$ and satisfies

$$\|x(t)\| \le \beta(\|x(t_0)\|, t - t_0) + \gamma\Big(\sup_{t_0 \le \tau \le t} \|u(\tau)\|\Big).$$

Theorem 1 ([24], [25]): Let $V : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function such that

$$\alpha_1(\|x\|) \le V(t, x) \le \alpha_2(\|x\|),$$
$$\frac{\partial V}{\partial t} + \frac{\partial V}{\partial x} f(t, x, u) \le -W(x), \quad \forall \|x\| \ge \rho(\|u\|) > 0, \tag{2}$$

for all $(t, x, u) \in [0, \infty) \times \mathbb{R}^n \times \mathbb{R}^{n_a}$, where $\alpha_1$, $\alpha_2$ are class $\mathcal{K}_\infty$ functions, $\rho$ is a class $\mathcal{K}$ function, and $W(x)$
is a continuous positive definite function on $\mathbb{R}^n$. Then, the system (1) is input-to-state stable (ISS).

Remark 1: Note that other equivalent definitions for ISS have been given in [25, pp. 1974-1975]. For instance,
Theorem 1 holds if inequality (2) is replaced by

$$\frac{\partial V}{\partial t} + \frac{\partial V}{\partial x} f(t, x, u) \le -\mu(\|x\|) + \Omega(\|u\|),$$

where $\mu \in \mathcal{K}_\infty \cap C^1$ and $\Omega \in \mathcal{K}_\infty$.
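
As a small numerical illustration of Definition 1, the sketch below simulates the scalar system $\dot{x} = -x + u$ and checks the ISS estimate with $\beta(r, s) = r e^{-s}$ and $\gamma(r) = r$. The system, the input signal, and the function names are illustrative assumptions and are not taken from the paper.

```python
import math

def simulate_scalar_iss(x0, u, t_end=10.0, dt=1e-3):
    """Euler simulation of x' = -x + u(t); returns lists of times and states."""
    n = int(round(t_end / dt))
    ts, xs = [0.0], [x0]
    x = x0
    for k in range(n):
        t = k * dt
        x = x + dt * (-x + u(t))
        ts.append((k + 1) * dt)
        xs.append(x)
    return ts, xs

def iss_bound(x0, u_sup, t):
    """beta(|x0|, t) + gamma(sup|u|) with beta(r, s) = r*exp(-s), gamma(r) = r."""
    return abs(x0) * math.exp(-t) + u_sup

def bounded_input(t):
    """Hypothetical bounded input with sup|u| = 0.5."""
    return 0.5 * math.sin(3.0 * t)

ts, xs = simulate_scalar_iss(x0=2.0, u=bounded_input)
violations = sum(1 for t, x in zip(ts, xs)
                 if abs(x) > iss_bound(2.0, 0.5, t) + 1e-12)
print("samples violating the ISS bound:", violations)
```

The transient term decays with time while the input term sets the size of the residual set, which is the structure exploited later when the input is the parameter estimation error.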


III. Problem Formulation 

A. Nonlinear system model 

We consider here affine uncertain nonlinear systems of the form

$$\dot{x} = f(x) + \Delta f(t, x) + g(x) u, \qquad y = h(x), \tag{3}$$

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^p$, $y \in \mathbb{R}^m$ ($p \ge m$), represent the state, the input and the controlled output vectors,
respectively. $\Delta f(t, x)$ is a vector field representing additive model uncertainties. The vector fields $f$, $\Delta f$, the columns
of $g$ and the function $h$ satisfy the following assumptions.

Assumption A1 The function $f : \mathbb{R}^n \to \mathbb{R}^n$ and the columns of $g : \mathbb{R}^n \to \mathbb{R}^{n \times p}$ are $C^\infty$ vector fields on a bounded
set $X$ of $\mathbb{R}^n$ and $h : \mathbb{R}^n \to \mathbb{R}^m$ is a $C^\infty$ vector on $X$. The vector field $\Delta f(x)$ is $C^1$ on $X$.

Assumption A2 System (3) has a well-defined (vector) relative degree $\{r_1, r_2, \ldots, r_m\}$ at each point $x^0 \in X$,
and the system is linearizable, i.e., $\sum_{i=1}^{m} r_i = n$.

Assumption A3 The desired output trajectories $y_{id}$ ($1 \le i \le m$) are smooth functions of time, relating desired
initial points $y_{id}(0)$ at $t = 0$ to desired final points $y_{id}(t_f)$ at $t = t_f$.

B. Control objectives 

Our objective is to design a state feedback adaptive controller such that the output tracking error is uniformly
bounded, whereas the tracking error upper-bound is a function of the uncertain parameters estimation error, which
can be decreased by the model-free learning. We stress here that the goal of the learning algorithm is not stabilization
but rather performance optimization, i.e., the learning improves the parameters estimation error, which in turn
improves the output tracking error. To achieve this control objective, we proceed as follows: First, we design
a robust controller which can guarantee input-to-state stability (ISS) of the tracking error dynamics w.r.t. the
estimation errors input. Then, we combine this controller with a model-free learning algorithm to iteratively
estimate the uncertain parameters, by optimizing online a desired learning cost function.

IV. Adaptive Controller Design 

A. Nominal Controller 

Let us first consider the system under nominal conditions, i.e., when $\Delta f(t, x) = 0$. In this case, it is well
known, e.g., [24], that system (3) can be written as

$$y^{(r)}(t) = b(\xi(t)) + A(\xi(t)) u(t), \tag{4}$$

where

$$y^{(r)}(t) = [y_1^{(r_1)}(t), \ldots, y_m^{(r_m)}(t)]^T,$$
$$\xi(t) = [\xi^1(t), \ldots, \xi^m(t)]^T, \tag{5}$$
$$\xi^i(t) = [y_i(t), \ldots, y_i^{(r_i - 1)}(t)], \quad 1 \le i \le m.$$

The functions $b(\xi)$, $A(\xi)$ can be written as functions of $f$, $g$ and $h$, and $A(\xi)$ is non-singular in $\tilde{X}$, where $\tilde{X}$
is the image of the set $X$ by the diffeomorphism $x \to \xi$ between the states of system (3) and the linearized
model (4). Now, to deal with the uncertain model, we first need to introduce one more assumption on system
(3).

Assumption A4 The additive uncertainties $\Delta f(t, x)$ in (3) appear as additive uncertainties in the input-output
linearized model (4)-(5) as follows (see also [26])

$$y^{(r)}(t) = b(\xi(t)) + A(\xi(t)) u(t) + \Delta b(t, \xi(t)), \tag{6}$$

where $\Delta b(t, \xi)$ is $C^1$ w.r.t. the state vector $\xi \in \tilde{X}$.

Remark 2: Assumption A4 can be ensured under the so-called matching conditions ([27], p. 146).

It is well known that the nominal model (4) can be easily transformed into a linear input-output mapping.
Indeed, we can first define a virtual input vector $v(t)$ as

$$v(t) = b(\xi(t)) + A(\xi(t)) u(t). \tag{7}$$

Combining (4) and (7), we can obtain the following input-output mapping

$$y^{(r)}(t) = v(t). \tag{8}$$

Based on the linear system (8), it is straightforward to design a stabilizing controller for the nominal system (4)
as

$$u_n = A^{-1}(\xi) \left[ v_s(t, \xi) - b(\xi) \right], \tag{9}$$

where $v_s$ is an $m \times 1$ vector and the $i$-th ($1 \le i \le m$) element $v_{si}$ is given by

$$v_{si} = y_{id}^{(r_i)} - K_{r_i}^i \left( y_i^{(r_i - 1)} - y_{id}^{(r_i - 1)} \right) - \cdots - K_1^i (y_i - y_{id}). \tag{10}$$

If we denote the tracking error as $e_i(t) = y_i(t) - y_{id}(t)$, we obtain the following tracking error dynamics

$$e_i^{(r_i)}(t) + K_{r_i}^i e_i^{(r_i - 1)}(t) + \cdots + K_1^i e_i(t) = 0, \tag{11}$$

where $i \in \{1, 2, \ldots, m\}$. By properly selecting the gains $K_j^i$, where $i \in \{1, 2, \ldots, m\}$ and $j \in
\{1, 2, \ldots, r_i\}$, we can obtain global asymptotic stability of the tracking errors $e_i(t)$. To formalize this condition,
we add the following assumption.

Assumption A5 There exists a non-empty set $\mathcal{A}$ where $K_j^i \in \mathcal{A}$ such that the polynomials in (11) are Hurwitz,
where $i \in \{1, 2, \ldots, m\}$ and $j \in \{1, 2, \ldots, r_i\}$.
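
To make Assumption A5 concrete, the following sketch checks the Hurwitz property for the special case $r_i = 2$, where the characteristic polynomial of (11) reduces to $s^2 + K_2^i s + K_1^i$. The function names and the restriction to second order are assumptions made for illustration only.

```python
import cmath

def second_order_roots(k1, k2):
    """Roots of s^2 + k2*s + k1, the characteristic polynomial of (11) for r_i = 2."""
    disc = cmath.sqrt(k2 * k2 - 4.0 * k1)
    return (-k2 + disc) / 2.0, (-k2 - disc) / 2.0

def is_hurwitz_second_order(k1, k2):
    """True when both roots have strictly negative real part."""
    return all(root.real < 0.0 for root in second_order_roots(k1, k2))

print(is_hurwitz_second_order(4.0, 4.0))   # gains giving a double root at s = -2
print(is_hurwitz_second_order(4.0, -1.0))  # negative damping gain
```

For a second-order polynomial the test is equivalent to requiring both gains to be positive; higher relative degrees would need a general root finder or a Routh-Hurwitz table.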

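To this end, we define $z = [z^1, z^2, \ldots, z^m]^T$, where $z^i = [e_i, \dot{e}_i, \ldots, e_i^{(r_i - 1)}]$ and $i \in \{1, 2, \ldots, m\}$.

Then, from (11), we can obtain

$$\dot{z} = \tilde{A} z,$$

where $\tilde{A} \in \mathbb{R}^{n \times n}$ is a block diagonal matrix given by

$$\tilde{A} = \mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\},$$

and $\tilde{A}_i$ ($1 \le i \le m$) is an $r_i \times r_i$ matrix given by

$$\tilde{A}_i = \begin{bmatrix} 0 & 1 & & & \\ 0 & 0 & 1 & & \\ \vdots & & \ddots & \ddots & \\ 0 & & & 0 & 1 \\ -K_1^i & -K_2^i & \cdots & \cdots & -K_{r_i}^i \end{bmatrix}. \tag{12}$$

As discussed above, the gains $K_j^i$ can be chosen such that the matrix $\tilde{A}$ is Hurwitz. Thus, there exists a positive
definite matrix $P > 0$ such that (see e.g. [24])

$$\tilde{A}^T P + P \tilde{A} = -I. \tag{13}$$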
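
For a single second-order block, the Lyapunov equation (13) can be solved by hand. The sketch below implements that closed form for $\tilde{A} = \begin{bmatrix} 0 & 1 \\ -k_1 & -k_2 \end{bmatrix}$ and verifies the residual; the helper names are hypothetical, and a general-purpose Lyapunov solver would be used for larger blocks.

```python
def lyapunov_p_second_order(k1, k2):
    """Solve A^T P + P A = -I for A = [[0, 1], [-k1, -k2]] (closed form)."""
    if k1 <= 0.0 or k2 <= 0.0:
        raise ValueError("A is Hurwitz only for k1 > 0 and k2 > 0")
    p12 = 1.0 / (2.0 * k1)          # from the (1,1) entry of (13)
    p22 = (p12 + 0.5) / k2          # from the (2,2) entry of (13)
    p11 = k2 * p12 + k1 * p22       # from the off-diagonal entry of (13)
    return [[p11, p12], [p12, p22]]

def lyapunov_residual(k1, k2, P):
    """Return A^T P + P A + I, which should be the zero matrix."""
    A = [[0.0, 1.0], [-k1, -k2]]
    R = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            atp = sum(A[k][i] * P[k][j] for k in range(2))
            pa = sum(P[i][k] * A[k][j] for k in range(2))
            R[i][j] = atp + pa + (1.0 if i == j else 0.0)
    return R

P = lyapunov_p_second_order(4.0, 4.0)
print(P)
print(lyapunov_residual(4.0, 4.0, P))
```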

In the next section, we build upon the nominal controller (9) to write a robust ISS controller.


B. Lyapunov reconstruction-based ISS Controller 

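We now consider the uncertain model (3), i.e., when $\Delta f(t, x) \ne 0$. The corresponding exact linearized model
is given by (6) where $\Delta b(t, \xi(t)) \ne 0$. The global asymptotic stability of the error dynamics (11) cannot be
guaranteed anymore due to the additive uncertainty $\Delta b(t, \xi(t))$. We use Lyapunov reconstruction techniques to
design a new controller so that the tracking error is guaranteed to be bounded given that the estimation error of
$\Delta b(t, \xi(t))$ is bounded. The new controller for the uncertain model (6) is defined as

$$u_f = u_n + u_r, \tag{14}$$

where the nominal controller $u_n$ is given by (9) and the robust controller $u_r$ will be given later. By using the
controller (14) and (6), we obtain

$$\begin{aligned} y^{(r)}(t) &= b(\xi(t)) + A(\xi(t)) u_f + \Delta b(t, \xi(t)), \\ &= b(\xi(t)) + A(\xi(t)) u_n + A(\xi(t)) u_r + \Delta b(t, \xi(t)), \\ &= v_s(t, \xi) + A(\xi(t)) u_r + \Delta b(t, \xi(t)), \end{aligned} \tag{15}$$

where (15) holds from (9). This leads to the following error dynamics

$$\dot{z} = \tilde{A} z + \tilde{B} \delta, \tag{16}$$

where $\tilde{A}$ is defined in (12), $\delta$ is an $m \times 1$ vector given by

$$\delta = A(\xi(t)) u_r + \Delta b(t, \xi(t)), \tag{17}$$

and the matrix $\tilde{B} \in \mathbb{R}^{n \times m}$ is given by

$$\tilde{B} = \begin{bmatrix} \tilde{B}_1 \\ \tilde{B}_2 \\ \vdots \\ \tilde{B}_m \end{bmatrix}, \tag{18}$$

where each $\tilde{B}_i$ ($1 \le i \le m$) is given by an $r_i \times m$ matrix such that

$$\tilde{B}_i(l, q) = \begin{cases} 1 & \text{for } l = r_i,\ q = i, \\ 0 & \text{otherwise.} \end{cases}$$

If we choose $V(z) = z^T P z$ as a Lyapunov function for the dynamics (16), where $P$ is the solution of the
Lyapunov equation (13), we obtain

$$\begin{aligned} \dot{V}(t) &= \frac{\partial V}{\partial z} \dot{z} = z^T (\tilde{A}^T P + P \tilde{A}) z + 2 z^T P \tilde{B} \delta, \\ &= -\|z\|^2 + 2 z^T P \tilde{B} \delta, \end{aligned} \tag{19}$$

where $\delta$ given by (17) depends on the robust controller $u_r$.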

Next, we design the controller $u_r$ based on the form of the uncertainties $\Delta b(t, \xi(t))$. More specifically, we
consider here the case when $\Delta b(t, \xi(t))$ is of the following form

$$\Delta b(t, \xi(t)) = E\, Q(\xi, t), \tag{20}$$

where $E \in \mathbb{R}^{m \times m}$ is a matrix of unknown constant parameters, and $Q(\xi, t) : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^m$ is a known bounded
function of states and time variables. For notational convenience, we denote by $\hat{E}(t)$ the estimate of $E$, and by
$e_E = E - \hat{E}$ the estimation error. We define the unknown parameter vector $\Delta = [E(1,1), \ldots, E(m,m)]^T \in \mathbb{R}^{m^2}$,
i.e., the concatenation of all elements of $E$; its estimate is denoted by $\hat{\Delta}(t) = [\hat{E}(1,1), \ldots, \hat{E}(m,m)]^T$, and the
estimation error vector is given by $e_\Delta(t) = \Delta - \hat{\Delta}(t)$.


Next, we propose the following robust controller

$$u_r = -A^{-1}(\xi) \left[ \tilde{B}^T P z \|Q(\xi, t)\|^2 + \hat{E}(t) Q(\xi, t) \right]. \tag{21}$$

The closed-loop error dynamics can be written as

$$\dot{z} = f(t, z, e_\Delta), \tag{22}$$

where $e_\Delta(t)$ is considered to be an input to the system (22).

Theorem 2: Consider the system (3), under Assumptions A1-A5, where $\Delta b(t, \xi(t))$ satisfies (20). If we apply
to (3) the feedback controller (14), where $u_n$ is given by (9) and $u_r$ is given by (21), then the closed-loop
system (22) is ISS from the estimation errors input $e_\Delta(t) \in \mathbb{R}^{m^2}$ to the tracking errors state $z(t) \in \mathbb{R}^n$.

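Proof: By substituting (21) into (17), we obtain

$$\begin{aligned} \delta &= -\tilde{B}^T P z \|Q(\xi, t)\|^2 - \hat{E}(t) Q(\xi, t) + \Delta b(t, \xi(t)) \\ &= -\tilde{B}^T P z \|Q(\xi, t)\|^2 - \hat{E}(t) Q(\xi, t) + E\, Q(\xi, t). \end{aligned}$$

If we consider $V(z) = z^T P z$ as a Lyapunov function for the error dynamics (16), then, from (19), we obtain

$$\dot{V} \le -\|z\|^2 + 2 z^T P \tilde{B} E\, Q(\xi, t) - 2 z^T P \tilde{B} \hat{E}(t) Q(\xi, t) - 2 \|z^T P \tilde{B}\|^2 \|Q(\xi, t)\|^2,$$

which leads to

$$\dot{V} \le -\|z\|^2 + 2 z^T P \tilde{B} e_E Q(\xi, t) - 2 \|z^T P \tilde{B}\|^2 \|Q(\xi, t)\|^2.$$

Since $z^T P \tilde{B} e_E Q(\xi) \le \|z^T P \tilde{B} e_E Q(\xi)\| \le \|z^T P \tilde{B}\| \|e_E\|_F \|Q(\xi)\| = \|z^T P \tilde{B}\| \|e_\Delta\| \|Q(\xi)\|$, we obtain

$$\begin{aligned} \dot{V} &\le -\|z\|^2 + 2 \|z^T P \tilde{B}\| \|e_\Delta\| \|Q(\xi, t)\| - 2 \|z^T P \tilde{B}\|^2 \|Q(\xi, t)\|^2 \\ &\le -\|z\|^2 - 2 \left( \|z^T P \tilde{B}\| \|Q(\xi, t)\| - \tfrac{1}{2} \|e_\Delta\| \right)^2 + \tfrac{1}{2} \|e_\Delta\|^2 \\ &\le -\|z\|^2 + \tfrac{1}{2} \|e_\Delta\|^2. \end{aligned}$$

Thus, we have the following relation

$$\dot{V} \le -\tfrac{1}{2} \|z\|^2, \quad \forall \|z\| \ge \|e_\Delta\| > 0.$$

Then from the Lyapunov direct theorem in [24], [25], we obtain that system (22) is ISS from input $e_\Delta$ to state
$z$.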
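
The behavior established by Theorem 2 can be observed on a toy example. The sketch below assumes a single-output plant $\ddot{y} = u + E\,Q(t)$ (so $b = 0$, $A = 1$, $m = 1$, $r_1 = 2$), a reference $y_d(t) = \sin t$, and a hypothetical regressor $Q(t) = \cos t$; none of these choices come from the paper. It applies $u_n$ from (9)-(10) and $u_r$ from (21) with a fixed estimate $\hat{E}$ and records $\|z(t)\|$.

```python
import math

K1, K2 = 4.0, 4.0            # gains: s^2 + K2*s + K1 = (s + 2)^2 is Hurwitz
P12 = 1.0 / (2.0 * K1)       # entries of P solving (13) for this 2x2 block
P22 = (P12 + 0.5) / K2
E_TRUE = 2.0                 # "unknown" parameter (assumed value)

def q_known(t):
    """Known bounded regressor Q(t) (hypothetical choice)."""
    return math.cos(t)

def simulate(e_hat, t_end=20.0, dt=1e-3):
    """Euler simulation of y'' = u + E*Q(t) with u = u_n + u_r; returns ||z|| samples."""
    y, ydot = 1.0, 1.0       # initial tracking error z(0) = (1, 0)
    norms = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        z1 = y - math.sin(t)             # position error
        z2 = ydot - math.cos(t)          # velocity error
        norms.append(math.hypot(z1, z2))
        q = q_known(t)
        u_n = -math.sin(t) - K2 * z2 - K1 * z1                # nominal part
        u_r = -((P12 * z1 + P22 * z2) * q * q + e_hat * q)    # robust part
        yddot = u_n + u_r + E_TRUE * q
        y, ydot = y + dt * ydot, ydot + dt * yddot
    return norms

exact = simulate(e_hat=E_TRUE)   # zero estimation error
wrong = simulate(e_hat=0.0)      # estimation error of size 2
print("late-time max ||z||, exact estimate:", max(exact[-5000:]))
print("late-time max ||z||, wrong estimate:", max(wrong[-5000:]))
```

With an exact estimate the error decays toward zero, while a constant estimation error leaves a bounded residual, which is the property the learning layer then exploits by shrinking the estimation error.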


C. MES-based parametric uncertainties estimation 
Let us now define the following cost function

$$J(\hat{\Delta}) = F(z(\hat{\Delta})), \tag{23}$$

where $F : \mathbb{R}^n \to \mathbb{R}$, $F(0) = 0$, $F(z) > 0$ for $z \in \mathbb{R}^n - \{0\}$. We need the following assumptions on $J$.

Assumption A6 The cost function $J$ has a local minimum at $\hat{\Delta}^* = \Delta$.

Assumption A7 The initial error $e_\Delta(t_0)$ is sufficiently small, i.e., the original parameter estimate vector $\hat{\Delta}$ is
close enough to the actual parameter vector $\Delta$.

Assumption A8 The cost function $J$ is analytic and its variation with respect to the uncertain parameters is
bounded in the neighborhood of $\hat{\Delta}^*$, i.e., $\left\| \frac{\partial J}{\partial \hat{\Delta}}(\tilde{\Delta}) \right\| \le \xi_2$, $\xi_2 > 0$, $\tilde{\Delta} \in \mathcal{V}(\hat{\Delta}^*)$, where $\mathcal{V}(\hat{\Delta}^*)$ denotes a compact
neighborhood of $\hat{\Delta}^*$.

We can now present the following result. 

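Lemma 3: Consider the system (3), under Assumptions A1-A8, where the uncertainty is given by (20). If we
apply to (3) the feedback controller (14), where $u_n$ is given by (9), $u_r$ is given by (21), the cost function is given
by (23), and $\hat{\Delta}(t)$ are estimated through the ES algorithm

$$\begin{aligned} \dot{x}_i &= a_i \sin\left(\omega_i t + \frac{\pi}{2}\right) J(\hat{\Delta}), \quad a_i > 0, \\ \hat{\Delta}_i(t) &= x_i + a_i \sin\left(\omega_i t - \frac{\pi}{2}\right), \quad i \in \{1, 2, \ldots, m^2\}, \end{aligned} \tag{24}$$

with $\omega_i \ne \omega_j$, $\omega_i + \omega_j \ne \omega_k$, $i, j, k \in \{1, 2, \ldots, m^2\}$, and $\omega_i > \omega^*$, $\forall i \in \{1, 2, \ldots, m^2\}$, with $\omega^*$ large enough.
Then, the norm of the error vector $z(t)$ admits the following bound

$$\|z(t)\| \le \beta(\|z(0)\|, t) + \gamma\left( \tilde{\beta}(\|e_\Delta(0)\|, t) + \|e_\Delta\|_{\max} \right),$$

where $\|e_\Delta\|_{\max} = \frac{\xi_1}{\omega_0} + \sqrt{\sum_{i=1}^{m^2} a_i^2}$, $\xi_1 > 0$, $\omega_0 = \max_{i \in \{1, 2, \ldots, m^2\}} \omega_i$, $\beta \in \mathcal{KL}$, $\tilde{\beta} \in \mathcal{KL}$ and $\gamma \in \mathcal{K}$.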

Proof: Based on Theorem 2, we know that the tracking error dynamics (22) is ISS from the input $e_\Delta(t)$ to
the state $z(t)$. Thus, by Definition 1, there exist a class $\mathcal{KL}$ function $\beta$ and a class $\mathcal{K}$ function $\gamma$ such that for
any initial state $z(0)$, any bounded input $e_\Delta(t)$ and any $t \ge 0$,

$$\|z(t)\| \le \beta(\|z(0)\|, t) + \gamma\Big( \sup_{0 \le \tau \le t} \|e_\Delta(\tau)\| \Big). \tag{25}$$

Now, we need to evaluate the bound on the estimation vector $\hat{\Delta}(t)$; to do so we use the results presented
in [17]. First, based on Assumption A8, the cost function is locally Lipschitz, i.e., there exists $\eta_1 > 0$ such
that $|J(\Delta_1) - J(\Delta_2)| \le \eta_1 \|\Delta_1 - \Delta_2\|$, for all $\Delta_1, \Delta_2 \in \mathcal{V}(\hat{\Delta}^*)$. Furthermore, since $J$ is analytic, it can be
approximated locally in $\mathcal{V}(\hat{\Delta}^*)$ by a quadratic function, e.g., a Taylor series up to the second order. Based on this
and on Assumptions A6 and A7 we can obtain the following bound ([17, p. 436-437], [28])

$$\|e_\Delta(t)\| - \|d(t)\| \le \|e_\Delta(t) - d(t)\| \le \tilde{\beta}(\|e_\Delta(0)\|, t) + \frac{\xi_1}{\omega_0},$$

where $\tilde{\beta} \in \mathcal{KL}$, $\xi_1 > 0$, $t \ge 0$, $\omega_0 = \max_{i \in \{1, 2, \ldots, m^2\}} \omega_i$, and $d(t) = [a_1 \sin(\omega_1 t + \frac{\pi}{2}), \ldots, a_{m^2} \sin(\omega_{m^2} t + \frac{\pi}{2})]^T$.

We can further obtain that

$$\begin{aligned} \|e_\Delta(t)\| &\le \tilde{\beta}(\|e_\Delta(0)\|, t) + \frac{\xi_1}{\omega_0} + \|d(t)\| \\ &\le \tilde{\beta}(\|e_\Delta(0)\|, t) + \frac{\xi_1}{\omega_0} + \sqrt{\sum_{i=1}^{m^2} a_i^2}. \end{aligned}$$

Together with (25), this yields the desired result.

Remark 3: The adaptive controller of Lemma 3 uses the ES algorithm (24) to estimate the model parametric
uncertainties. One might ask the question: where is the famous persistence of excitation (PE) condition here? The
answer can be found in the examination of equation (24). Indeed, the ES algorithm uses as ‘input’ the sinusoidal
signals $a_i \sin(\omega_i t + \frac{\pi}{2})$, which clearly satisfy the PE condition. The main difference with classical adaptive control
results is that these excitation signals do not enter the system dynamics directly, but instead are applied as
inputs to the ES algorithm, reflected in the ES estimation outputs, and thus transmitted to the system through
the feedback loop.
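
The dither-based update (24) can be sketched in a few lines. In the example below the closed-loop cost is replaced by a hypothetical static quadratic cost with a known minimizer, so that convergence of the ES states can be checked directly; the amplitudes, frequencies, horizon, and function names are illustrative assumptions, and in the adaptive controller the cost would instead be computed from measured tracking errors.

```python
import math

def extremum_seeking(cost, n_params, amps, omegas, t_end=40.0, dt=1e-3):
    """Euler integration of the dither-based ES law (24); returns the states x_i."""
    x = [0.0] * n_params
    for k in range(int(round(t_end / dt))):
        t = k * dt
        # Parameter estimates: ES state plus dither (second line of (24)).
        theta = [x[i] + amps[i] * math.sin(omegas[i] * t - math.pi / 2.0)
                 for i in range(n_params)]
        j = cost(theta)
        # ES state update driven by the measured cost (first line of (24)).
        for i in range(n_params):
            x[i] += dt * amps[i] * math.sin(omegas[i] * t + math.pi / 2.0) * j
    return x

TRUE_PARAMS = [1.0, -0.5]    # hypothetical "actual" parameters

def quadratic_cost(theta):
    """Hypothetical cost with a unique minimum at TRUE_PARAMS."""
    return sum((th - tp) ** 2 for th, tp in zip(theta, TRUE_PARAMS))

# Distinct frequencies, chosen so that no omega_i + omega_j equals another omega_k.
x_final = extremum_seeking(quadratic_cost, 2, amps=[0.5, 0.5], omegas=[50.0, 65.0])
print("ES states after the run:", x_final)
```

On average the update behaves like a gradient descent on the cost with rate proportional to $a_i^2$, and the residual oscillation shrinks as the dither frequencies grow, in line with the $\xi_1 / \omega_0$ term of Lemma 3.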

As mentioned earlier, the dither-based MES can converge to local minima. To address this point, we propose
in the next section to use GP-UCB as the model-free learning algorithm for model uncertainties estimation.



D. GP-UCB based parametric uncertainties estimation 


In this section we propose to use the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm to find the
uncertain parameter vector $\Delta$ [20], [29]. GP-UCB is a Bayesian optimization algorithm for stochastic optimization,
i.e., the task of finding the global optimum of an unknown function when the evaluations are potentially
contaminated with noise. The underlying working assumption for Bayesian optimization algorithms, including
GP-UCB, is that the function evaluation is costly, so we would like to minimize the number of evaluations
while having as accurate an estimate of the minimizer (or maximizer) as possible [30]. For GP-UCB, this goal is
guaranteed by having an upper bound on the regret of the algorithm, to be defined precisely later.

One difficulty of stochastic optimization is that since we only observe noisy samples from the function, we
cannot really be sure about the exact value of a function at any given point. One may try to query a single
point many times in order to have an accurate estimate of the function. This, however, may lead to an excessive
number of samples, and can be a wasteful way of assigning samples when the true value of the function at
that point is actually far from optimal. The Upper Confidence Bounds (UCB) family of algorithms provides
a principled approach to guide the search [31]. These algorithms, which are not necessarily formulated in a
Bayesian framework, automatically balance the exploration (i.e., finding regions of the parameter space that
might be promising) and the exploitation (i.e., focusing on the regions that are known to be the best based on
the current available knowledge) using the principle of optimism in the face of uncertainty. These algorithms
often come with strong theoretical guarantees about their performance. For more information about the UCB class
of algorithms, refer to [32], [33], [34]. GP-UCB is a particular UCB algorithm that is suitable to deal with
continuous domains. It uses a Gaussian Process (GP) to maintain the mean and confidence information about the
unknown function.

We briefly discuss GP-UCB in our context following the discussion of the original papers [20], [29]. Consider
the cost function $J : D \to \mathbb{R}$ to be minimized. This function depends on the dynamics of the closed-loop system,
which itself depends on the parameters $\hat{\Delta}$ used in the controller design. So we may consider it as an unknown
function of $\hat{\Delta}$.

For the moment, let us assume that $J$ is a function sampled from a Gaussian Process (GP) [35]. Recall
that a GP is a stochastic process indexed by the set $D$ that has the property that for any finite subset of
the evaluation points, that is $\{\hat{\Delta}_1, \hat{\Delta}_2, \ldots, \hat{\Delta}_t\} \subset D$, the joint distribution of $\left( J(\hat{\Delta}_i) \right)_{i=1}^{t}$ is a multivariate
Gaussian distribution. A GP is defined by a mean function $\mu(\hat{\Delta}) = \mathbb{E}[J(\hat{\Delta})]$ and its covariance function (or kernel)
$\kappa(\hat{\Delta}, \hat{\Delta}') = \mathrm{Cov}(J(\hat{\Delta}), J(\hat{\Delta}')) = \mathbb{E}\left[ (J(\hat{\Delta}) - \mu(\hat{\Delta})) (J(\hat{\Delta}') - \mu(\hat{\Delta}'))^T \right]$. The kernel $\kappa$ of a GP determines the
behavior of a typical function sampled from the GP. For instance, if we choose $\kappa(\hat{\Delta}, \hat{\Delta}') = \exp\left( -\frac{\|\hat{\Delta} - \hat{\Delta}'\|^2}{2 l^2} \right)$,
the squared exponential kernel with length scale $l > 0$, it implies that the GP is mean square differentiable
of all orders. We write $J \sim GP(\mu, \kappa)$.






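Let us first briefly describe how we can find the posterior distribution of a $GP(0, \kappa)$, i.e., a GP with zero prior mean.
Suppose that for $\hat{\boldsymbol{\Delta}}_{t-1} = \{\hat{\Delta}_1, \hat{\Delta}_2, \ldots, \hat{\Delta}_{t-1}\} \subset D$, we have observed the noisy evaluations $y_i = J(\hat{\Delta}_i) + \eta_i$
with $\eta_i \sim \mathcal{N}(0, \sigma^2)$ being i.i.d. Gaussian noise. We can find the posterior mean and variance for a new point
$\hat{\Delta}^* \in D$ as follows: Denote the vector of observed values by $y_{t-1} = [y_1, \ldots, y_{t-1}]^T \in \mathbb{R}^{t-1}$, and define the
Grammian matrix $K \in \mathbb{R}^{(t-1) \times (t-1)}$ with $[K]_{i,j} = \kappa(\hat{\Delta}_i, \hat{\Delta}_j)$, and the vector $\kappa_* = [\kappa(\hat{\Delta}_1, \hat{\Delta}^*), \ldots, \kappa(\hat{\Delta}_{t-1}, \hat{\Delta}^*)]$.

The expected mean $\mu_t(\hat{\Delta}^*)$ and the variance $\sigma_t^2(\hat{\Delta}^*)$ of the posterior of the GP evaluated at $\hat{\Delta}^*$ are (cf. Section
2.2 of [35])

$$\mu_t(\hat{\Delta}^*) = \kappa_* \left[ K + \sigma^2 I \right]^{-1} y_{t-1},$$
$$\sigma_t^2(\hat{\Delta}^*) = \kappa(\hat{\Delta}^*, \hat{\Delta}^*) - \kappa_* \left[ K + \sigma^2 I \right]^{-1} \kappa_*^T.$$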


At round $t$, the GP-UCB algorithm selects the next query point $\hat{\Delta}_t$ by solving the following optimization
problem*:

$$\hat{\Delta}_t \leftarrow \arg\min_{\hat{\Delta} \in D}\ \mu_{t-1}(\hat{\Delta}) - \beta_t^{1/2} \sigma_{t-1}(\hat{\Delta}), \tag{26}$$

where $\beta_t$ depends on the choice of kernel among other parameters of the problem.

The optimization problem (26) is often nonlinear and non-convex. Nonetheless, solving it only requires querying
the GP, which in general is much faster than querying the original dynamical system. This is important when the
dynamical system is a physical system and we would like to minimize the number of interactions with it before
finding a $\hat{\Delta}$ with small $J(\hat{\Delta})$. One practically easy way to approximately solve (26) is to restrict the search to a
finite subset $D'$ of $D$. The finite subset can be a uniform grid structure over $D$, or it might consist of randomly
selected members of $D$.
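
The selection rule (26) restricted to a finite grid can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: a scalar parameter, a squared exponential kernel with a hand-picked length scale, a constant confidence weight `beta` instead of the schedule $\beta_t$ from [20], and a hypothetical test cost. All function names are illustrative.

```python
import math

def se_kernel(a, b, length=0.3):
    """Squared exponential kernel on scalars (length scale is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def posterior(xs, ys, x_star, k_inv):
    """Posterior mean and variance at x_star for a zero-mean GP."""
    k_star = [se_kernel(x, x_star) for x in xs]
    n = len(xs)
    w = [sum(k_inv[i][j] * k_star[j] for j in range(n)) for i in range(n)]
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    var = se_kernel(x_star, x_star) - sum(wi * ki for wi, ki in zip(w, k_star))
    return mean, max(var, 0.0)

def gp_lcb_minimize(cost, grid, rounds=15, beta=4.0, noise_var=1e-4):
    """Query `cost` on a finite grid using the lower-confidence-bound rule (26)."""
    xs, ys = [grid[0]], [cost(grid[0])]
    for _ in range(rounds - 1):
        gram = [[se_kernel(a, b) + (noise_var if i == j else 0.0)
                 for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        k_inv = invert(gram)

        def lcb(x):
            mean, var = posterior(xs, ys, x, k_inv)
            return mean - math.sqrt(beta) * math.sqrt(var)

        x_next = min(grid, key=lcb)      # approximate argmin over D'
        xs.append(x_next)
        ys.append(cost(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def demo_cost(x):
    """Hypothetical cost with its minimum at x = 0.62."""
    return (x - 0.62) ** 2

demo_grid = [i / 50.0 for i in range(51)]
x_best, y_best, queried = gp_lcb_minimize(demo_cost, demo_grid)
print("best queried point:", x_best, "cost:", y_best)
```

Each round only inverts a small Gram matrix and scans the grid, so the expensive object, the closed-loop experiment behind `cost`, is evaluated once per round.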

The theoretical guarantee for GP-UCB is in the form of a regret upper bound. Let us define $\hat{\Delta}^* \leftarrow \arg\min_{\hat{\Delta} \in D} J(\hat{\Delta})$,
the global minimizer of the objective function. The regret at time $t$ is defined by $r_t = J(\hat{\Delta}_t) - J(\hat{\Delta}^*)$. This is
a measure of sub-optimality of the choice of $\hat{\Delta}_t$ according to the cost function $J$. The cumulative regret at time $T$
is defined as $R_T = \sum_{t=1}^{T} r_t$. Ideally we would like $\lim_{T \to \infty} \frac{R_T}{T} = 0$.
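
The regret quantities defined above are simple to compute once the optimal cost is known, which is only the case in synthetic tests. The helper below is an illustrative sketch with hypothetical names and made-up cost values.

```python
def regrets(costs, optimal_cost):
    """Instantaneous regrets r_t and cumulative regrets R_1, ..., R_T."""
    instantaneous = [c - optimal_cost for c in costs]
    cumulative = []
    total = 0.0
    for value in instantaneous:
        total += value
        cumulative.append(total)
    return instantaneous, cumulative

# Made-up sequence of query costs whose regret vanishes over time.
r, R = regrets([0.5, 0.2, 0.05, 0.0], optimal_cost=0.0)
average_regret = R[-1] / len(R)
print(r, R, average_regret)
```

A no-regret algorithm is one for which `average_regret` tends to zero as the number of queries grows.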

The behavior of the cumulative regret $R_T$ depends on the set $D$ and the choice of kernel. If we fix the
confidence parameter $\delta > 0$, for the squared exponential kernel, the asymptotic behavior of $R_T$ is

$$O\left( \sqrt{T \left[ \log^{d+1}(T) + \log(1/\delta) \right]} \right)$$

with probability at least $1 - \delta$ (cf. Theorem 3 of [20], [29]). This result does not even require the function $J$ to
be a GP. It only requires the function to have a finite norm in the reproducing kernel Hilbert space (RKHS) $\mathcal{H}_\kappa$
defined by the kernel $\kappa$.

*UCB algorithms are often formulated as maximization problems, so the “upper” confidence bound is calculated. Here we actually
compute the “lower” confidence bound, but to keep the naming convention, we still use GP-UCB instead of GP-LCB.





Remark 4: One main difference with some of the existing model-based adaptive controllers is the fact that
the learning-based estimation algorithm used here does not depend on the model of the system, i.e., the only
information needed to compute the learning cost function (23) is the desired trajectory and the measured output
of the system (please refer to Section V for an example). This makes the learning-based adaptive controllers
suitable for the general case of nonlinear parametric uncertainties. For example in [36], a similar preliminary
algorithm has been tested in the case of nonlinear models of electromagnetic actuators with a nonlinear parametric
uncertainty. Another point worth mentioning here is the fact that with the available modular model-based adaptive
controllers, like the X-swapping modular algorithms, e.g., [37], it is not possible in some cases to estimate
multiple uncertainties simultaneously. For instance, it is shown in [15] that the X-swapping adaptive control
cannot estimate multiple uncertainties in the case of electromagnetic actuators, due to the linear dependency
of the uncertain parameters, i.e., when we consider three parametric uncertainties affecting the same output
acceleration, in which case the model-based estimation filters cannot distinguish between the uncertainties from
this acceleration. However, when dealing with the same example, the MES-based modular indirect adaptive control
approach was successful in estimating multiple uncertainties at the same time [28]. A similarly challenging case
is considered in the example presented in the next section.


V. Two-link Manipulator Example 

We consider here a two-link robot manipulator, with the following dynamics (see e.g. [38])

$$H(q) \ddot{q} + C(q, \dot{q}) \dot{q} + G(q) = \tau, \tag{27}$$

where $q = [q_1, q_2]^T$ denotes the two joint angles and $\tau = [\tau_1, \tau_2]^T$ denotes the two joint torques. The matrix
$H \in \mathbb{R}^{2 \times 2}$ is assumed to be non-singular and its elements are given by

$$\begin{aligned} H_{11} &= m_1 l_{c1}^2 + I_1 + m_2 \left[ l_1^2 + l_{c2}^2 + 2 l_1 l_{c2} \cos(q_2) \right] + I_2, \\ H_{12} &= m_2 l_1 l_{c2} \cos(q_2) + m_2 l_{c2}^2 + I_2, \\ H_{21} &= H_{12}, \\ H_{22} &= m_2 l_{c2}^2 + I_2. \end{aligned} \tag{28}$$

The matrix $C(q, \dot{q})$ is given by

$$C(q, \dot{q}) = \begin{bmatrix} -h \dot{q}_2 & -h \dot{q}_1 - h \dot{q}_2 \\ h \dot{q}_1 & 0 \end{bmatrix},$$

where $h = m_2 l_1 l_{c2} \sin(q_2)$. The vector $G = [G_1, G_2]^T$ is given by

$$\begin{aligned} G_1 &= m_1 l_{c1} g \cos(q_1) + m_2 g \left[ l_2 \cos(q_1 + q_2) + l_1 \cos(q_1) \right], \\ G_2 &= m_2 l_{c2} g \cos(q_1 + q_2), \end{aligned} \tag{29}$$

where $l_1$, $l_2$ are the lengths of the first and second link, respectively, $l_{c1}$, $l_{c2}$ are the distances between the
rotation center and the center of mass of the first and second link, respectively, $m_1$, $m_2$ are the masses of the
first and second link, respectively, $I_1$ is the moment of inertia of the first link and $I_2$ the moment of inertia of
the second link, and $g$ denotes the earth gravitational constant.
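
The model (27)-(29) translates directly into code. In the sketch below, the masses, lengths and gravity constant follow the values the text lists for its simulations, whereas the two moments of inertia are placeholder assumptions; the function names are hypothetical as well.

```python
import math

# Masses, lengths and gravity follow the simulation values quoted in the text.
# I1 and I2 are assumed placeholders, not values confirmed by the source.
PARAMS = dict(m1=10.5, m2=5.5, l1=1.1, l2=1.1, lc1=0.5, lc2=0.5,
              I1=11.0 / 12.0, I2=5.5 / 12.0, g=9.8)

def inertia_matrix(q, p=PARAMS):
    """H(q) from (28)."""
    c2 = math.cos(q[1])
    h11 = (p["m1"] * p["lc1"] ** 2 + p["I1"]
           + p["m2"] * (p["l1"] ** 2 + p["lc2"] ** 2
                        + 2.0 * p["l1"] * p["lc2"] * c2)
           + p["I2"])
    h12 = p["m2"] * p["l1"] * p["lc2"] * c2 + p["m2"] * p["lc2"] ** 2 + p["I2"]
    h22 = p["m2"] * p["lc2"] ** 2 + p["I2"]
    return [[h11, h12], [h12, h22]]

def coriolis_matrix(q, dq, p=PARAMS):
    """C(q, dq) with h = m2*l1*lc2*sin(q2)."""
    h = p["m2"] * p["l1"] * p["lc2"] * math.sin(q[1])
    return [[-h * dq[1], -h * dq[0] - h * dq[1]], [h * dq[0], 0.0]]

def gravity_vector(q, p=PARAMS):
    """G(q) from (29), written as in the text."""
    g1 = (p["m1"] * p["lc1"] * p["g"] * math.cos(q[0])
          + p["m2"] * p["g"] * (p["l2"] * math.cos(q[0] + q[1])
                                + p["l1"] * math.cos(q[0])))
    g2 = p["m2"] * p["lc2"] * p["g"] * math.cos(q[0] + q[1])
    return [g1, g2]

def forward_dynamics(q, dq, tau, p=PARAMS):
    """Joint accelerations from (27), solving the 2x2 system by Cramer's rule."""
    H = inertia_matrix(q, p)
    C = coriolis_matrix(q, dq, p)
    G = gravity_vector(q, p)
    rhs = [tau[i] - C[i][0] * dq[0] - C[i][1] * dq[1] - G[i] for i in range(2)]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(H[1][1] * rhs[0] - H[0][1] * rhs[1]) / det,
            (H[0][0] * rhs[1] - H[1][0] * rhs[0]) / det]

q0 = [0.3, -0.4]
print(forward_dynamics(q0, [0.0, 0.0], gravity_vector(q0)))
```

Applying a torque equal to the gravity vector at rest should produce zero acceleration, which gives a quick sanity check of the implementation.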

In our simulations, we assume that the parameters take the following values: h = ^ kg ■ m?, mi = 10.5 kg, 
m 2 = 5.5 kg, £i = 1.1 m, £2 = 1.1 m, £ci = 0.5 m, ^2 = 0.5 m, h = ^ kg ■ m?, g = 9.8 mjs^. The system 
dynamics (ITT] ) can he rewritten as 

q = H-\q)T-H-\q)[C{q,q)q + Giq)]. (30) 

Thus, the nominal controller is given hy 

Tn = [C{q,q)q + G{q)] 

+ H{q) [q'd - Kd{q - qd) - Kp{q - qd )], (3i) 

where $q_d = [q_{1d}, q_{2d}]^T$ denotes the desired trajectory and the diagonal gain matrices $K_p > 0$, $K_d > 0$ are chosen
such that the linear error dynamics are asymptotically stable (as in the general design presented earlier). We choose
as output references the 5th order polynomials $q_{1ref}(t) = q_{2ref}(t) = \sum_{i=0}^{5} a_i (t/t_f)^i$, where the $a_i$'s have been
computed to satisfy the boundary constraints $q_{iref}(0) = 0$, $q_{iref}(t_f) = q_f$, $\dot{q}_{iref}(0) = \dot{q}_{iref}(t_f) = 0$,
$\ddot{q}_{iref}(0) = \ddot{q}_{iref}(t_f) = 0$, $i = 1,2$, with $t_f = 2$ sec, $q_f = 1.5$ rad. In these tests, we assume that the nonlinear
model (27) is uncertain. In particular, we assume that there exist additive uncertainties in the model (30), i.e.,

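$$\ddot{q} = H^{-1}(q)\tau - H^{-1}(q)\left[C(q,\dot{q})\dot{q} + G(q)\right] - \Delta\, G(q), \tag{32}$$

where $\Delta$ is a matrix of constant uncertain parameters. Following the robust design presented earlier, the robust
part of the control writes as

$$\tau_r = -H\left(B^T P z\,\|G\|^2 - \hat{\Delta}\, G(q)\right), \tag{33}$$

where

$$B^T = \begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},$$

$P$ is the solution of the Lyapunov equation introduced earlier, with

$$A = \begin{bmatrix} 0 & 1 & 0 & 0\\ -K_p^1 & -K_d^1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & -K_p^2 & -K_d^2 \end{bmatrix},$$

where $K_p^i$, $K_d^i$ denote the $i$-th diagonal entries of $K_p$, $K_d$, $z = [q_1 - q_{1d},\ \dot{q}_1 - \dot{q}_{1d},\ q_2 - q_{2d},\ \dot{q}_2 - \dot{q}_{2d}]^T$,
and $\hat{\Delta}$ is the matrix of the parameters’ estimates. Eventually, the final feedback controller writes as

$$\tau = \tau_n + \tau_r. \tag{34}$$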
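
The control law (31), (33), (34) can be sketched as follows. This is only an illustration under stated assumptions: the Lyapunov equation is taken to be $PA + A^T P = -I$ (its exact form is given earlier in the paper and is not restated in this section), the block-diagonal structure of $A$ is used to solve it joint by joint in closed form, and all function names are hypothetical. The model terms $H$, $C$, $G$ are passed in as plain nested lists.

```python
def lyapunov_block(kp, kd):
    """Entries (p11, p12, p22) of the 2x2 solution of P*A + A^T*P = -I
    for A = [[0, 1], [-kp, -kd]] (ASSUMPTION: right-hand side is -I)."""
    p12 = 1.0 / (2.0 * kp)
    p22 = (1.0 + kp) / (2.0 * kp * kd)
    p11 = kd / (2.0 * kp) + (1.0 + kp) / (2.0 * kd)
    return p11, p12, p22


def matvec(M, v):
    """Product of a nested-list matrix with a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def iss_control(H, C, G, q, dq, qd, dqd, ddqd, kp, kd, delta_hat):
    """Total torque tau = tau_n + tau_r of equation (34)."""
    e = [q[i] - qd[i] for i in range(2)]       # position tracking errors
    de = [dq[i] - dqd[i] for i in range(2)]    # velocity tracking errors

    # Nominal part, equation (31)
    v = [ddqd[i] - kd[i] * de[i] - kp[i] * e[i] for i in range(2)]
    c_dq, h_v = matvec(C, dq), matvec(H, v)
    tau_n = [c_dq[i] + G[i] + h_v[i] for i in range(2)]

    # Robust part, equation (33): B^T P z selects the velocity rows of P z
    g_norm_sq = sum(g * g for g in G)
    bt_p_z = []
    for i in range(2):
        _, p12, p22 = lyapunov_block(kp[i], kd[i])
        bt_p_z.append(p12 * e[i] + p22 * de[i])
    dg = matvec(delta_hat, G)
    w = [bt_p_z[i] * g_norm_sq - dg[i] for i in range(2)]
    tau_r = [-x for x in matvec(H, w)]

    return [tau_n[i] + tau_r[i] for i in range(2)]
```

With zero tracking error the damping term vanishes and the robust part reduces to $H\hat{\Delta}G(q)$, i.e., a feedforward cancellation of the estimated uncertainty.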


We consider here the challenging case where the uncertain parameters are linearly dependent. In this case the
uncertainties’ ‘effect’ is not observable from the measured output (see the earlier remark on this point). Indeed, in
the case where the uncertainties enter the model in a linearly dependent function, e.g., when the matrix $\Delta$ has only
one non-zero row, some of the classical available modular model-based adaptive controllers, like for instance
X-swapping controllers, cannot be used to estimate all the uncertain parameters simultaneously. For example, it has
been shown in [15] that the model-based gradient descent filters failed to estimate simultaneously multiple
parameters in the case of the electromagnetic actuators example. Moreover, in comparison with the ES-based indirect
adaptive controller of [21], the modular approach does not rely on the parameters’ mutual exhaustive assumption,
i.e., the requirement that each element of the control vector be linearly dependent on at least one element of the
uncertainties vector. More specifically, we consider here the following case: $\Delta(1,1) = 0.3$, $\Delta(1,2) = 0.6$, and
$\Delta(2,i) = 0$, $i = 1,2$. In this case, the uncertainties’ effect on the acceleration $\ddot{q}_1$ cannot be differentiated, and
thus the application of the model-based X-swapping method to estimate the actual values of both uncertainties at the
same time is challenging. Similarly, the method of [21] cannot be readily applied because the second control $\tau_2$
does not depend linearly on the uncertainties, which only affect $\tau_1$. However, we show next that, by using the
modular ISS-based controller, we manage to estimate the actual values of the uncertainties simultaneously and improve
the tracking performance.
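
As a worked illustration of this case, writing $\Delta$ for the uncertainty matrix of (32) and using the values quoted above, the uncertain term reads

$$\Delta\, G(q) = \begin{bmatrix} 0.3 & 0.6\\ 0 & 0 \end{bmatrix}\begin{bmatrix} G_1(q)\\ G_2(q) \end{bmatrix} = \begin{bmatrix} 0.3\, G_1(q) + 0.6\, G_2(q)\\ 0 \end{bmatrix},$$

so both parameters enter the dynamics only through a single scalar combination acting on $\ddot{q}_1$, while $\ddot{q}_2$ is not directly affected.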


A. MES-based uncertainties estimation 

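The estimates of the two parameters $\Delta_i$ ($i = 1, 2$) are computed using a discrete version of (24), given by

$$x_i(k+1) = x_i(k) + a_i t_f \sin\left(\omega_i t_f k + \frac{\pi}{2}\right) J(\hat{\Delta}), \tag{35}$$

$$\hat{\Delta}_i(k+1) = x_i(k+1) + a_i \sin\left(\omega_i t_f k - \frac{\pi}{2}\right), \quad i = 1,2, \tag{36}$$

where $k \in \mathbb{N}$ denotes the iteration index, and $x_i(0) = \hat{\Delta}_i(0) = 0$. We choose the following learning cost function

$$J(\hat{\Delta}) = \int_0^{t_f} \left(q(\hat{\Delta}) - q_d(t)\right)^T Q_1 \left(q(\hat{\Delta}) - q_d(t)\right) dt
+ \int_0^{t_f} \left(\dot{q}(\hat{\Delta}) - \dot{q}_d(t)\right)^T Q_2 \left(\dot{q}(\hat{\Delta}) - \dot{q}_d(t)\right) dt,$$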
where $Q_1 > 0$ and $Q_2 > 0$ denote the weight matrices. We implement the learning parameters: $a_1 = 0.1$,
$a_2 = 0.05$, $\omega_1 = 7$ rad/sec, $\omega_2 = \ldots$ rad/sec. The obtained performance cost function is displayed in Figure
1(a), where we see that the performance improves over the learning iterations. The corresponding parameter
estimation profiles are reported in Figures 1(b) and 1(c), which show a quick convergence of the first estimate
$\hat{\Delta}_1$ to a neighborhood of the actual value. The convergence of the second estimate $\hat{\Delta}_2$ is slower, which is
expected from the ES algorithms when many parameters are estimated at the same time. One has to underline
here, however, that the convergence speed of the estimates and the excursion around their final mean values
can be directly fine-tuned by the proper choice of the learning coefficients $a_i$, $\omega_i$, $i = 1,2$ in equation (35).
Finally, the tracking performance is shown in Figures 2(a) and 2(b), where we can see that, after learning the actual
values of the uncertainties, the tracking of the desired trajectories is recovered. We only show the first angular
trajectories here, because the uncertainties directly affect only the acceleration $\ddot{q}_1$, and their effect on the tracking
for the second angular variable is negligible.
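
The learning cost can be evaluated from sampled closed-loop data as in the sketch below. Several choices here are assumptions made for illustration only: the quintic reference is written in its closed form for the stated boundary constraints (rest-to-rest motion from $0$ to $q_f$ over $[0, t_f]$, held at its end values outside that interval), the weight matrices $Q_1$, $Q_2$ are taken diagonal, the integrals are approximated with the trapezoidal rule, and the function names are hypothetical.

```python
def quintic_reference(t, tf=2.0, qf=1.5):
    """5th-order rest-to-rest reference: returns (q_ref, dq_ref, ddq_ref)."""
    s = min(max(t / tf, 0.0), 1.0)   # normalized time, clamped to [0, 1]
    q = qf * (10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5)
    dq = qf * (30 * s ** 2 - 60 * s ** 3 + 30 * s ** 4) / tf
    ddq = qf * (60 * s - 180 * s ** 2 + 120 * s ** 3) / tf ** 2
    return q, dq, ddq


def tracking_cost(ts, qs, dqs, q1_diag, q2_diag, tf=2.0, qf=1.5):
    """Trapezoidal approximation of the cost J.

    ts      : increasing sample times
    qs, dqs : per-sample joint angles / velocities, each entry [joint1, joint2]
    q1_diag : diagonal of Q1 (ASSUMPTION: diagonal weights)
    q2_diag : diagonal of Q2 (ASSUMPTION: diagonal weights)
    """
    integrand = []
    for t, q, dq in zip(ts, qs, dqs):
        q_ref, dq_ref, _ = quintic_reference(t, tf, qf)  # same reference for both joints
        pos = sum(w * (qi - q_ref) ** 2 for w, qi in zip(q1_diag, q))
        vel = sum(w * (dqi - dq_ref) ** 2 for w, dqi in zip(q2_diag, dq))
        integrand.append(pos + vel)
    return sum(0.5 * (integrand[k] + integrand[k + 1]) * (ts[k + 1] - ts[k])
               for k in range(len(ts) - 1))
```

In a learning iteration, `qs` and `dqs` would come from one closed-loop run over $[0, t_f]$ with the current estimate $\hat{\Delta}$, and the returned value is what the estimator receives as $J(\hat{\Delta})$.
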

(a) Cost function over the learning iterations (MES)
(b) Estimate of $\Delta_1$ over the learning iterations (MES)
(c) Estimate of $\Delta_2$ over the learning iterations (MES)
Fig. 1. Cost function and uncertainties estimates (MES algorithm)
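
One iteration of the discrete update (35)-(36) can be sketched as follows. The function names are hypothetical, and the cost passed to `run_mes` stands for one closed-loop evaluation of $J(\hat{\Delta})$; any callable returning a scalar can be used to exercise the code. The value of $\omega_2$ must be supplied by the user.

```python
import math


def mes_step(x, k, cost_value, a, omega, tf):
    """One iteration of (35)-(36) for all parameters.

    x          : current internal states x_i(k)
    k          : iteration index
    cost_value : J evaluated with the current estimates
    a, omega   : dither amplitudes a_i and frequencies omega_i
    tf         : duration of one learning iteration
    Returns (x(k+1), delta_hat(k+1)).
    """
    x_next, delta_next = [], []
    for xi, ai, wi in zip(x, a, omega):
        xi_next = xi + ai * tf * math.sin(wi * tf * k + math.pi / 2) * cost_value
        x_next.append(xi_next)
        delta_next.append(xi_next + ai * math.sin(wi * tf * k - math.pi / 2))
    return x_next, delta_next


def run_mes(cost, n_iter, a, omega, tf):
    """Iterate the update from x_i(0) = delta_hat_i(0) = 0; returns all estimates."""
    x = [0.0] * len(a)
    delta_hat = [0.0] * len(a)
    history = [list(delta_hat)]
    for k in range(n_iter):
        x, delta_hat = mes_step(x, k, cost(delta_hat), a, omega, tf)
        history.append(list(delta_hat))
    return history
```

The two phase shifts of $\pm\pi/2$ make the perturbation in (36) the negative of the demodulation signal in (35), which is what turns the correlation with the cost into a descent direction on average.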


B. GP UCB-based uncertainties estimation 

In this section, to show that the modular ISS-based controller is independent of the choice of the learning
algorithm, we apply the GP-UCB learning algorithm-based estimator to the same two-link manipulator example.
We apply the algorithm of Section IV-D with the following parameters: $\sigma = 0.1$, $l = 0.2$, and $\beta_t = \ldots$, with
$\delta = 0.05$.
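
For readers unfamiliar with GP-UCB, the sketch below shows the basic loop on a finite candidate set. It is not the implementation used for the reported results, and it rests on several assumptions: a unit-variance squared-exponential kernel with length scale $l$, $\sigma$ interpreted as the observation-noise standard deviation, the standard finite-set schedule $\beta_t = 2\log(|D| t^2 \pi^2 / (6\delta))$ from the GP-UCB literature, and a reward defined as the negative of the learning cost so that maximizing the reward minimizes $J$. All function names are hypothetical.

```python
import math


def se_kernel(x, y, length):
    """Unit-variance squared-exponential kernel."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) / length ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, length, noise):
    """Factor K + noise^2 I and precompute alpha = (K + noise^2 I)^{-1} y."""
    K = [[se_kernel(a, b, length) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    return L, backward_sub(L, forward_sub(L, y))


def gp_predict(X, L, alpha, x_new, length):
    """Posterior mean and standard deviation at x_new (zero prior mean)."""
    k_star = [se_kernel(a, x_new, length) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    return mean, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))


def gp_ucb(reward, candidates, n_iter, length=0.2, noise=0.1, delta=0.05):
    """Pick, at each round, the candidate maximizing mean + sqrt(beta_t) * std."""
    X, y = [candidates[0]], [reward(candidates[0])]   # arbitrary first query
    for t in range(2, n_iter + 1):
        beta = 2.0 * math.log(len(candidates) * t ** 2 * math.pi ** 2 / (6.0 * delta))
        L, alpha = gp_fit(X, y, length, noise)

        def ucb(c):
            mean, std = gp_predict(X, L, alpha, c, length)
            return mean + math.sqrt(beta) * std

        x_next = max(candidates, key=ucb)
        X.append(x_next)
        y.append(reward(x_next))
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], X, y
```

In the setting of this section, each candidate would be a pair $(\hat{\Delta}_1, \hat{\Delta}_2)$ on a grid and `reward` would run one closed-loop trial and return $-J(\hat{\Delta})$. Unlike the MES update, no dither is injected, so the queried estimates do not oscillate permanently around their final values.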

We test the GP-UCB algorithm under the same conditions as in the previous section. The obtained parameters
and tracking results are reported in Figures 3(a), 3(b), 3(c), 4(a), and 4(b). We can see in these figures that,
similarly to the MES-based adaptive controller, the uncertainties are well estimated. One could argue that they are
better estimated with the GP-UCB algorithm because it uses no permanent dither signal, whereas the dither leads to
permanent oscillations in the MES-based learning. The tracking performance is clearly improved in this case as well,
due to the precise estimation of the parameters.

(a) Obtained vs. desired first angular trajectory (MES)
(b) Obtained vs. desired first angular velocity trajectory (MES)
Fig. 2. Obtained vs. desired trajectories (MES)

(a) Cost function over the learning iterations (GP-UCB)
(b) Estimate of $\Delta_1$ over the learning iterations (GP-UCB)
(c) Estimate of $\Delta_2$ over the learning iterations (GP-UCB)
Fig. 3. Cost function and uncertainties estimates (GP-UCB algorithm)

(a) Obtained vs. desired first angular trajectory (GP-UCB)
(b) Obtained vs. desired first angular velocity trajectory (GP-UCB)
Fig. 4. Obtained vs. desired trajectories (GP-UCB)


VI. Conclusion 

We have studied the problem of adaptive control for nonlinear systems which are affine in the control with
parametric uncertainties. For this class of systems, we have proposed the following controller: we use a modular
approach, where we first design a robust nonlinear controller based on the model (assuming knowledge
of the uncertain parameters), and then complement this controller with an estimation module to estimate the actual
values of the uncertain parameters. This type of modular approach is certainly not new, e.g., the X-swapping
methods. However, the novelty here is that the estimation module that we propose is based on model-free learning
algorithms. Indeed, we propose to use two learning algorithms, namely, a multi-parametric extremum seeking
algorithm and a GP-UCB algorithm, to learn in real time the uncertainties of the model. We call the learning
approach ‘model-free’ for the simple reason that it only requires measuring an output signal from the system
and comparing it to a desired reference signal (independent of the model) to learn the best estimates of the
model uncertainties. We have guaranteed the stability (while learning) of the proposed approach by ensuring
that the model-based robust controller leads to an ISS result, which guarantees boundedness of the states of the
closed-loop system, even during the learning phase. The ISS result together with a convergent learning algorithm
eventually leads to a bounded output tracking error, which decreases with the decrease of the estimation error. We
believe that one of the main advantages of the proposed controller, compared with the existing model-based
adaptive controllers, is that we can learn (estimate) multiple uncertainties at the same time even if they appear
in the model equation in a challenging way, e.g., linearly dependent uncertainties affecting only one output, or
uncertainties appearing in a nonlinear term of the model, which are well-known limitations of the model-based
estimation approaches. Another advantage of the proposed approach is that, due to its modular design, one could
easily change the learning algorithm without having to change the model-based part of the controller. Indeed, as
long as the first part of the controller, i.e., the model-based part, has been designed with a proper ISS property,
one can ‘plug into it’ any convergent model-free learning algorithm, as demonstrated here by using two different
learning approaches. We reported in this short paper some preliminary results about using GP-UCB in a modular
adaptive control setting. In a longer journal version of the work, we will report more detailed comparisons
between the MES-based adaptive controller, the GP UCB-based controller (for example in a more realistic noisy
environment), and some existing model-based ‘classical’ adaptive controllers, e.g., as found in [1].